SECTION  I
INTRODUCTION 

Discrimination analysis has developed through broad phases in
much the same manner as the general history of statistical
inference: the Pearsonian phase, with the introduction of the
coefficient of racial likeness; the Fisherian phase, connected
with the linear discriminant function; the Neyman-Pearson phase,
with the introduction of the notions of risk and minimax; and the
contemporary Waldian phase. Although the coefficient of racial
likeness and the generalized distance, proposed by Karl Pearson
and P. C. Mahalanobis, respectively, are statistics for testing
the hypothesis of homogeneity, they were the predecessors of
discriminatory techniques. It was not until the middle 1930's
that R. A. Fisher presented the first clear statement of the
problem of discrimination and the first proposed solution to it.
An excellent survey of the literature on discriminatory analysis
and related topics has been compiled by J. L. Hodges in [4].

The general discrimination problem may be classified into three
principal types as follows:

(1).   A Finite Number of Known Distributions - Let X be a random
variable which is known to be distributed according to one of a
finite number of distributions with known density functions
$f_j(x)$, j = 1, ..., m. On the basis of an observation on X, the
problem is to determine which one of the m known distributions is
the distribution of X.

(2).   Finite Number of Parametric Families of Distributions - Let
X be a random variable which is known to have a distribution in
one of a finite number of families of distributions. The
distributions in the j-th family have density functions
$f_j(x, \varphi_j)$ of known form which depend upon a parameter
$\varphi_j$ lying in a parameter space $\Omega_j$, j = 1, ..., m.
On the basis of an observation on X, the problem is to determine
which one of the m families contains the distribution of X.

(3).   Nonparametric - Let T be an individual which is known to
belong to one of a finite number of populations $\pi_j$,
j = 1, ..., m. To each individual there corresponds an observable
value of a random variable, which may be vector-valued. On the
basis of a random sample of $n_j$ individuals from population
$\pi_j$, j = 1, ..., m, the problem is to decide which one of the
m populations contains the individual T as a member.

It may be that the only observation available is the observation
on the random variable X to be classified, but usually there are,
in addition, other observations available which can be used to
estimate the distributions to which X is to be assigned.

The nonparametric type of discrimination problem has received the
least attention to date. In [2], Hodges and Fix considered the
problem of nonparametric classification in the case of two
populations and developed procedures which were shown to have
asymptotically optimum properties for large samples. In [3],
Hodges and Fix compared several of these nonparametric procedures
against the linear discriminant function when the two populations
are normal with equal covariance matrices. The linear discriminant
function is a widely employed classification procedure, and it is
therefore of interest to determine the performance of this
procedure when the populations are not Gaussian. In [1], Thomas E.
Eaton compared one of the nonparametric procedures proposed in [2]
against the linear discriminant function when the two populations
were exponential. The basis of comparison in both [1] and [3] was
the probability of misclassification. This thesis is a
continuation of the research started in [1].

Section II summarizes the procedures and results of [3], since all
of the procedures used in this paper are analogous. Section III
provides a complete comparison of the probabilities of
misclassification of a nonparametric procedure against the linear
discriminant function when the two populations are exponential.
Section III also includes a limited tabulation of the
probabilities of misclassification for the linear discriminant
function when the two populations are gamma and one of the
parameters has its domain restricted to the positive integers.
Due to time limitations, it was not possible to determine a
satisfactory computational formula for the probabilities of
misclassification of the nonparametric procedure when the two
populations are gamma. Section IV contains conclusions and
recommendations based on the results obtained in Section III.

I am indebted to Professor J. R. Borsting for his encouragement
and most capable guidance and advice while acting as faculty
advisor, and wish to thank Professor R. R. Read for his valuable
assistance and advice as second reader. I also wish to thank and
acknowledge Mrs. Patricia Johnson for programming the procedures
developed in Section III of this thesis.


SECTION  II 

PERFORMANCE  OF  THE  LINEAR  DISCRIMINANT
FUNCTION  AND  A  NONPARAMETRIC  DISCRIMINATOR
WHEN  THE  TWO  POPULATIONS  HAVE  NORMAL
DISTRIBUTIONS  WITH  EQUAL  COVARIANCE  MATRICES


Let $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$ be samples
from the p-variate distributions F and G, respectively, and let Z
be an observation known to be from either F or G. On what basis
should it be decided to which population Z belongs? When F and G
are p-variate normal distributions with equal covariance matrices,
the linear discriminant function is known to be an appropriate
procedure. But what is a reasonable procedure when the parametric
forms of F and G are not known?

In [2], Hodges and Fix suggest, as an intuitive approach, the
following nonparametric procedure: Define in p-dimensional space a
notion of distance which will permit a ranking of the 2n
observations according to their nearness to Z. Then select an odd
integer, k, and assign Z to that distribution from which came the
majority of the k nearest observations. Several classes of these
nonparametric discriminators are shown to have asymptotically
optimum performance in the sense that the probabilities of
misclassification,

   $P_1$ = P[Z is assigned to G | Z came from F]

   $P_2$ = P[Z is assigned to F | Z came from G],

tend, as n tends to infinity, to the theoretical minimum values
attainable if F and G were completely known.
Since it would not be reasonable to employ a nonparametric
procedure solely on the basis of asymptotic properties and
simplicity of application, an investigation is made in [3] to
determine how much discriminating power is lost through the use
of a nonparametric discriminator when samples are small. To this
end, Hodges and Fix assume that F and G are normal with equal
covariance matrices, so that the linear discriminant function is
appropriate. A comparison of the probabilities of
misclassification, $P_1$ and $P_2$, which result when the linear
discriminant function is employed with the corresponding
probabilities obtained when an alternate nonparametric
discrimination procedure is used then indicates how much
discriminating power is lost when sample sizes are small. The
remainder of this Section is devoted to summarizing some of the
procedures and results of [3].

The principal distance function compared with the linear
discriminant function is

(1).   $\Delta(x,z) = \max_{1 \le i \le p} |x_i - z_i|$,

although $\Delta$ is just one of a large class of distance
functions, any one of which could be used. This fact is mentioned
since the probabilities of error, $P_1$ and $P_2$, depend very
heavily on the distance function employed. Also, a great part of
the computations is made using k = 1; that is, Z is assigned to
the population F or G from which came the individual of the pooled
samples which most closely resembles Z. This case will be called
the rule of the "nearest neighbor."
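
The rule described above can be sketched in a few lines of Python. This is only an illustration of the k-nearest-observation vote with the distance function (1); the function names `chebyshev` and `knn_assign` and the small samples are hypothetical and do not come from [2] or [3].

```python
from collections import Counter

def chebyshev(x, z):
    """Distance (1): the largest coordinate-wise absolute difference."""
    return max(abs(xi - zi) for xi, zi in zip(x, z))

def knn_assign(z, sample_f, sample_g, k=1, dist=chebyshev):
    """Assign z to 'F' or 'G' by majority vote among its k nearest sample points."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that the vote cannot tie")
    # Pool both samples, remembering the population each point came from.
    labelled = [(dist(x, z), "F") for x in sample_f]
    labelled += [(dist(y, z), "G") for y in sample_g]
    labelled.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in labelled[:k])
    return votes.most_common(1)[0][0]

# Illustrative bivariate samples (made up for this example).
xs = [(0.0, 0.0), (0.5, -0.2), (-0.3, 0.4)]
ys = [(2.0, 0.1), (1.8, -0.5), (2.5, 0.3)]
print(knn_assign((0.2, 0.1), xs, ys, k=1))
print(knn_assign((2.1, 0.0), xs, ys, k=3))
```

With k = 1 the vote reduces to the "nearest neighbor" rule; any other distance function can be passed through the `dist` argument.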

By considering linear transformations on the observation space,
the problem can be reduced considerably, since it is always
possible by such transformations to ensure that F and G have the
identity covariance matrix. Thus, the p transformed measurements
have unit variance and are independent in each population. It is
also possible by such transformations to place the expectation
vector of F at the origin and the expectation vector of G on the
positive first axis. Under such linear transformations, the
probabilities of misclassification, $P_1$ and $P_2$, are unchanged
for both the nonparametric procedure and the linear discriminant
function. Thus, without loss of generality, it is sufficient to
consider the transformed populations with the two parameters, p
and $\lambda$, where

   $\lambda$ = E(first coordinate of Y)
             = distance between the means of the
               transformed populations.

Furthermore, from the symmetry of the problem it is evident that
$P_1 = P_2$ for both procedures; consequently, it is sufficient to
compute $P_1$, that is, to assume Z is distributed according to F.


For the univariate case, p = 1, the linear discriminant function
is greatly simplified since no matrix computation occurs. The
procedure consists simply of computing the arithmetic mean of the
sample means,

   $(\bar{X} + \bar{Y})/2$,

and assigning Z to that population whose sample mean lies on the
same side of $(\bar{X} + \bar{Y})/2$ as does Z itself. The
probabilities of misclassification are readily computed by
introducing two new variables which are functions of $\bar{X}$,
$\bar{Y}$, and Z. The exact procedure is outlined in [3], but is
not included in this summary since the subsequent investigation
does not depend upon this technique. Table 1 provides a tabulation
of values of $P_1 = P_2$ for various values of n and $\lambda$.
All tables in this section have been reproduced from [3].
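
One way to carry out such a computation is sketched below. It is an assumption of this sketch, and not necessarily the derivation used in [3], that the two new variables are U = Z - (X̄ + Ȳ)/2 and V = Ȳ - X̄. With F = N(0, 1) and G = N(λ, 1) these are independent normal variables, and an error occurs exactly when U and V have the same sign. The function names are illustrative.

```python
import math

def phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ldf_error_normal(n, lam):
    """P1 = P2 for the univariate rule with n observations per sample.

    Assumes F = N(0, 1), G = N(lam, 1) and Z drawn from F, so that
      U = Z - (Xbar + Ybar)/2  ~  N(-lam/2, 1 + 1/(2n)),
      V = Ybar - Xbar          ~  N(lam, 2/n),
    with U and V uncorrelated (hence independent).
    """
    p_u_pos = phi(-(lam / 2.0) / math.sqrt(1.0 + 1.0 / (2.0 * n)))
    p_v_pos = phi(lam / math.sqrt(2.0 / n))
    # Error: Z falls on the same side of the midpoint as Ybar.
    return p_u_pos * p_v_pos + (1.0 - p_u_pos) * (1.0 - p_v_pos)

for n in (1, 2, 4, 20):
    print(n, [round(ldf_error_normal(n, lam), 4) for lam in (1, 2, 3)])
```

The values produced this way can be compared directly with the corresponding entries of Table 1.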

For p = 1, the distance function $\Delta$ corresponds to ordinary
Euclidean distance, and the nonparametric procedure using the
"rule of the nearest neighbor," k = 1, consists of assigning Z to
that population from which came the sample individual nearest to
Z. The probability, $P_1$, that the nearest neighbor to Z is one
of the Y's, given that Z is distributed as X, is readily computed
using the following technique. Define $P_1(z)$ to be the
conditional probability that the nearest of the 2n sample
observations to Z is a Y, given that Z = z. Then

(2).   $P_1 = \int_{-\infty}^{\infty} P_1(z)\, f(z)\, dz$,

where f is the density function corresponding to F. Continuing
exactly as in [3], it remains only to calculate $P_1(z)$. The
event "the nearest sample value to z is a Y" may be decomposed
into n mutually exclusive events, "the nearest sample value to z
is $Y_i$," i = 1, 2, ..., n, where the $|Y_i - z|$ are
independent, identically distributed random variables. By defining

   $H_z(\delta) = P(|X - z| < \delta)$

and

   $K_z(\delta) = P(|Y - z| < \delta)$,

it is readily shown that the probability element of the minimum of
the $|Y_i - z|$, i = 1, 2, ..., n, is

   $n\,[1 - K_z(\delta)]^{n-1}\, dK_z(\delta)$

and that $P_1(z)$ can be computed by the formula

(3).   $P_1(z) = n \int_0^{\infty} [1 - H_z(\delta)]^{n}\, [1 - K_z(\delta)]^{n-1}\, dK_z(\delta)$.

Formulae (2) and (3) form the basis for all the computations for
the "nearest neighbor rule" for any p. Tables 2 and 2-A provide a
tabulation of $P_1 = P_2$ for the nonparametric discriminator,
k = 1, for various values of n and $\lambda$.
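
Formulae (2) and (3) can be evaluated numerically. The sketch below does so for the univariate normal case, assuming F = N(0, 1) and G = N(λ, 1). The composite Simpson rule, the truncation limits and the step counts are choices made for this illustration only and are not taken from [3].

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def simpson(func, a, b, steps):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def p1_given_z(z, n, lam, d_max=12.0, steps=240):
    """Formula (3) with F = N(0, 1) and G = N(lam, 1)."""
    def integrand(d):
        h = norm_cdf(z + d) - norm_cdf(z - d)                # H_z(d)
        k = norm_cdf(z + d - lam) - norm_cdf(z - d - lam)    # K_z(d)
        dk = norm_pdf(z + d - lam) + norm_pdf(z - d - lam)   # dK_z/dd
        return (1.0 - h) ** n * (1.0 - k) ** (n - 1) * dk
    return n * simpson(integrand, 0.0, d_max, steps)

def nn_error_normal(n, lam, z_max=8.0, steps=160):
    """Formula (2): average P1(z) over z distributed according to F."""
    return simpson(lambda z: p1_given_z(z, n, lam) * norm_pdf(z),
                   -z_max, z_max, steps)

print(round(nn_error_normal(1, 1.0), 4))
print(round(nn_error_normal(2, 1.0), 4))
```

For n = 1 the nearest neighbor rule and the linear discriminant function coincide, which gives a convenient check of the numerical integration against Table 1.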


It was shown in [3] that, for large n,

(4).   $P_1 = E\left[\frac{g(Z)}{f(Z)+g(Z)}\right] = \int \frac{g(z)\,f(z)}{f(z)+g(z)}\,dz$.

The above formula was obtained from an expansion of formula (3)
and is quite general. An application of Schwarz's inequality to
formula (4) shows that the integral cannot exceed 1/2.
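
The bound of 1/2 can also be checked by an elementary argument. The short derivation below is offered only as a sketch; it is a different route from the Schwarz-inequality argument cited above, and it uses nothing beyond the fact that f and g are densities.

```latex
\[
(f - g)^2 \ge 0
\;\Longrightarrow\;
4fg \le (f+g)^2
\;\Longrightarrow\;
\frac{f(z)\,g(z)}{f(z)+g(z)} \le \frac{f(z)+g(z)}{4}
\quad\text{wherever } f(z)+g(z) > 0,
\]
\[
\int \frac{f(z)\,g(z)}{f(z)+g(z)}\,dz
\;\le\; \frac{1}{4}\left(\int f(z)\,dz + \int g(z)\,dz\right)
= \frac{1}{2}.
\]
```

Equality holds only when f = g almost everywhere, that is, when the two populations cannot be distinguished at all.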

Also investigated in [3] are the following additional cases:

(i)   A nonparametric procedure using $\Delta$ as a distance
function with k > 2 for the univariate and bivariate normal
distributions.

(ii)  A nonparametric procedure using $\Delta$ as a distance
function with k = 1, n = 1, and p > 2.

(iii) The effect of other distance functions on the probabilities
of misclassification for the bivariate normal distribution.

Due to laborious computations, the investigation of several of
the above cases was quite limited, but the results that were
obtained indicate that the nonparametric procedures gave
"reasonable" error probabilities in cases (i) and (ii). For the
bivariate normal distribution, however, different distance
functions produced vastly different error probabilities in some
instances.

TABLE  1

PROBABILITY  OF  ERROR,  LINEAR  DISCRIMINANT  FUNCTION,
UNIVARIATE  NORMAL  DISTRIBUTIONS

   n      λ=1      λ=2      λ=3

   1     .4175    .2532    .1235
   2     .3821    .1999    .0910
   3     .3611    .1819    .0826
   4     .3472    .1744    .0787
   5     .3376    .1707    .0763
  10     .3175    .1646    .0716
  20     .3110    .1616    .0692
  50     .3094    .1599    .0678
   ∞     .3085    .1587    .0668

n = size of sample taken from each population
λ = distance between the means of the two populations
Probability of error = P(Z is assigned to G | Z came from F)
                     = P(Z is assigned to F | Z came from G)

TABLE  2
PROBABILITY  OF  ERROR,  NONPARAMETRIC  DISCRIMINATOR
WITH  k=1,  UNIVARIATE  NORMAL  DISTRIBUTION

   n      λ=1      λ=2      λ=3

   1     .4175    .2532    .1235
   2     .4086    .2364    .1084
   3     .4052    .2307    .1036
   4     .4032    .2280    .1014


TABLE  2-A
APPROXIMATE  PROBABILITY  OF  ERROR,  NONPARAMETRIC
DISCRIMINATOR  WITH  k=1,  UNIVARIATE  NORMAL  DISTRIBUTION

   n      λ=1      λ=2      λ=3

   4     .403     .226     .102
   5     .401     .22?     .100
  10     .399     .223     .098
  20     .398     .224     .098
  50     .398     .225     .098
   ∞     .398     .225     .098

n = size of sample from each population
λ = distance between the means of the two populations
? = digit illegible in the source copy
Probability of error = P(Z is assigned to G | Z came from F)
                     = P(Z is assigned to F | Z came from G)

SECTION  III

PERFORMANCE  OF  THE  LINEAR  DISCRIMINANT 
FUNCTION  AND  THE  "RULE  OF  NEAREST  NEIGHBOR" 
WHEN  THE  TWO  POPULATIONS  HAVE  GAMMA  DISTRIBUTIONS 


The validity of the linear discriminant function when the data
are obviously not normal has been of great concern to many users
and potential users of this discrimination procedure. In [1],
T. E. Eaton investigated the performance of the linear
discriminant function and a nonparametric procedure for sample
sizes one and two when the univariate distributions, F and G, are
assumed to be exponential with parameters $\lambda$ and $\mu$,
respectively. This investigation was performed by computing the
probabilities of misclassification. The results of this study
showed that both the linear discriminant function and the
nonparametric discriminator using $\Delta$ as a distance function
and "the rule of nearest neighbor" can give high probabilities of
misclassification for sample sizes one and two. In this section,
the investigation started in [1] is continued in order to provide
a limited indication of how much discriminating power the linear
discriminant function and the "rule of nearest neighbor" have
when the populations are not normal.

The scope of the present study is an investigation of the
probabilities of misclassification,

   $P_1$ = P[Z is assigned to G | Z came from F]

   $P_2$ = P[Z is assigned to F | Z came from G],

for the two-population classification problem when the following
two procedures are employed:

(i)   The nonparametric procedure employing $\Delta$ as a distance
function and using the "rule of the nearest neighbor," k = 1, when
F and G are exponentially distributed with parameters $\lambda$
and $\mu$, respectively, and $\lambda = c\mu$, where c is greater
than zero.

(ii)  The linear discriminant function when F and G have gamma
distributions with parameters $(r, \lambda)$ and $(r, \mu)$,
respectively, where r is a positive integer and, as above,
$\lambda = c\mu$, where c is greater than zero.

The density functions of F and G will be denoted by
$f(x; r, c\mu)$ and $g(y; r, \mu)$, respectively, where

(5).   $f(x; r, c\mu) = \frac{(c\mu)^r x^{r-1}}{\Gamma(r)} \exp(-c\mu x)$

and

(6).   $g(y; r, \mu) = \frac{\mu^r y^{r-1}}{\Gamma(r)} \exp(-\mu y)$.

Obviously, when r = 1 in formulas (5) and (6) above,
$f(x; 1, c\mu)$ and $g(y; 1, \mu)$ are exponential.

A computation formula for the error probabilities, $P_1$ and
$P_2$, will be developed first for the "rule of nearest neighbor,"
procedure (i) above. This procedure consists of assigning Z to
that population from which came the sample individual nearest to
Z.

Assuming samples of equal size, say n, are available from each
population, it is observed that the following relation,

(7).   $P_1(n,c) = P_2(n,1/c)$,

exists between the error probabilities when F and G have gamma
distributions with density functions defined by formulas (5) and
(6); hence, this relationship holds in particular when F and G are
exponential. Using exactly the same technique as was outlined in
Section II, it is observed that if Z = z and $\delta > 0$, then

   $H_z(\delta) = P(|X - z| < \delta) = \begin{cases} \int_0^{z+\delta} f(x; r, c\mu)\,dx, & \text{if } \delta \ge z \\ \int_{z-\delta}^{z+\delta} f(x; r, c\mu)\,dx, & \text{if } \delta \le z \end{cases}$

   $K_z(\delta) = P(|Y - z| < \delta) = \begin{cases} \int_0^{z+\delta} g(y; r, \mu)\,dy, & \text{if } \delta \ge z \\ \int_{z-\delta}^{z+\delta} g(y; r, \mu)\,dy, & \text{if } \delta \le z \end{cases}$

It follows from formulas (2) and (3) of Section II that

   $P_1(n,c) = n \int_0^\infty f(z; r, c\mu)\,dz \int_z^\infty [1 - H_z(\delta)]^n [1 - K_z(\delta)]^{n-1}\,dK_z(\delta)$
   $\qquad + n \int_0^\infty f(z; r, c\mu)\,dz \int_0^z [1 - H_z(\delta)]^n [1 - K_z(\delta)]^{n-1}\,dK_z(\delta)$.

Hence, by the simple change of variables $\delta' = c\delta$,
$z' = cz$, $y' = cy$, and $x' = cx$, it follows that

   $P_1(n,c) = n \int_0^\infty f(z; r, \mu)\,dz \int_z^\infty [1 - H_z(\delta)]^n [1 - K_z(\delta)]^{n-1}\,dK_z(\delta)$
   $\qquad + n \int_0^\infty f(z; r, \mu)\,dz \int_0^z [1 - H_z(\delta)]^n [1 - K_z(\delta)]^{n-1}\,dK_z(\delta)$
   $\qquad = P_2(n, 1/c)$,

where, after the change of variables, $H_z$ and $K_z$ refer to
gamma populations with parameters $(r, \mu)$ and $(r, \mu/c)$,
respectively.
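
Relation (7) can be checked empirically. The following Monte Carlo sketch is an illustration only: the function name, the number of trials and the seeds are arbitrary choices, and $\mu$ is set to 1 without loss of generality. It estimates both sides of (7) for the nearest neighbor rule with gamma populations.

```python
import random

def nn_error_gamma(n, r, c, z_from, trials=60_000, seed=0):
    """Monte Carlo estimate of a nearest-neighbor error probability.

    F is gamma with shape r and rate c (mu is set to 1); G is gamma with
    shape r and rate 1.  z_from="F" estimates P1(n, c) and z_from="G"
    estimates P2(n, c).
    """
    rng = random.Random(seed)
    draw_f = lambda: rng.gammavariate(r, 1.0 / c)  # second argument is the scale
    draw_g = lambda: rng.gammavariate(r, 1.0)
    errors = 0
    for _ in range(trials):
        z = draw_f() if z_from == "F" else draw_g()
        nearest_x = min(abs(draw_f() - z) for _ in range(n))
        nearest_y = min(abs(draw_g() - z) for _ in range(n))
        assigned = "F" if nearest_x < nearest_y else "G"
        errors += assigned != z_from
    return errors / trials

p1 = nn_error_gamma(n=2, r=2, c=3.0, z_from="F")
p2 = nn_error_gamma(n=2, r=2, c=1.0 / 3.0, z_from="G", seed=1)
print(round(p1, 3), round(p2, 3))
```

The two estimates should agree up to sampling error, which is of the order of 0.002 for the default number of trials.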

Unfortunately, it was only possible to determine a suitable
computational formula for $P_1(n,c)$ when F and G are assumed to
have exponential distributions. A preliminary survey indicated
that a large computational program would be required if F and G
are assumed to have the gamma distributions defined at the
beginning of this section.

When F and G are assumed to be exponential, a suitable computation
formula for $P_1(n,c)$ is obtained as follows. First, let
$z' = \mu z$ and $\delta' = \mu\delta$, integrate, and combine
terms to obtain

   $P_1(n,c) = \frac{c}{(c+1)(2nc+2n+c)} + 2nc \int_0^\infty \exp(-cz-z)\,dz \int_0^z [1 - 2\exp(-cz)\sinh c\delta]^{n}\, [1 - 2\exp(-z)\sinh\delta]^{n-1} \cosh\delta\; d\delta$.

Then, by interchanging the order of integration and expanding both
$[1 - 2\exp(-cz)\sinh c\delta]^{n}$ and
$[1 - 2\exp(-z)\sinh\delta]^{n-1}$ into binomial series, it can be
shown that

   $P_1(n,c) = \frac{c}{(c+1)(2nc+2n+c)} + nc \sum_{k=0}^{n} \sum_{j=0}^{n-1} \binom{n}{k} \binom{n-1}{j} \frac{1}{ck+j+c+1} \sum_{i=0}^{k} \sum_{p=0}^{j} (-1)^{i+p} \binom{k}{i} \binom{j}{p} \left[ \frac{1}{F_{k,j,i,p}} + \frac{1}{F_{k,j,i,p}+2} \right]$,

where

   $F_{k,j,i,p} = 2ck + 2j - 2ci - 2p + c$.
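
The binomial-series formula lends itself to direct evaluation. The sketch below uses exact rational arithmetic so that the alternating sums do not lose accuracy. It illustrates the formula as written above and is not the program used to prepare Table 3; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def nn_error_exponential(n, c):
    """Evaluate the binomial-series formula for P1(n, c) exactly.

    c should be an int or a Fraction so that the arithmetic stays rational.
    """
    c = Fraction(c)
    total = Fraction(0)
    for k in range(n + 1):
        for j in range(n):
            inner = Fraction(0)
            for i in range(k + 1):
                for p in range(j + 1):
                    f = 2 * c * k + 2 * j - 2 * c * i - 2 * p + c  # F_{k,j,i,p}
                    sign = -1 if (i + p) % 2 else 1
                    inner += sign * comb(k, i) * comb(j, p) * (1 / f + 1 / (f + 2))
            total += comb(n, k) * comb(n - 1, j) * inner / (c * k + j + c + 1)
    return c / ((c + 1) * (2 * n * c + 2 * n + c)) + n * c * total

print(float(nn_error_exponential(1, 2)))               # 0.4
print(float(nn_error_exponential(2, 2)))
print(float(nn_error_exponential(1, Fraction(1, 2))))
```

Because the number of terms grows roughly like the fourth power of n, this direct evaluation is practical only for moderate sample sizes.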

Since $P_1(n,c) = P_2(n,1/c)$, Table 3 provides $P_1(n,c)$ for
c = 1, 2, 3, 4, 5, 10, 20 and the reciprocals for a wide range of
values of n. Then, by utilizing formula (4) of Section II, it is
possible to obtain a reasonable upper bound for $P_1(n,c)$ as n
tends to infinity. To begin with, it is observed that $P_1^*(c)$,
defined as

   $P_1^*(c) = \lim_{n\to\infty} P_1(n,c) = \int_0^\infty \frac{c\exp(-cz)\,dz}{c\exp(-zc+z)+1}$,

has by Schwarz's inequality an upper bound of 1/2. A better upper
bound can be obtained for $c \ge 5$ and $c \le 1/5$ by noting that

   $\frac{c\exp(-cx)}{c\exp(-xc+x)+1} < \frac{c\exp(-xc+x)}{c\exp(-xc+x)+1}$

for $0 < x < \infty$ and c > 0; hence, for c > 1, integration
yields

   $P_1^*(c) < \int_0^\infty \frac{c\exp(-xc+x)\,dx}{c\exp(-xc+x)+1} = \frac{\ln(c+1)}{c-1}$.

Therefore the bound $\ln(c+1)/(c-1)$ applies for c > 1 and, with c
replaced by 1/c, for 0 < c < 1, since it is evident from formula
(4) that $P_1^*(c) = P_1^*(1/c) = P_2^*(c) = P_2^*(1/c)$. Table 3
contains the limiting probabilities, P*, which were computed by
numerical integration using Simpson's rule.
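
A numerical integration of this kind can be sketched as follows. The truncation point of the integral, the number of Simpson steps and the function name are assumptions of this illustration and are not taken from the original computation.

```python
import math

def limiting_nn_error(c, steps=2000):
    """Simpson's-rule estimate of P1*(c) for exponential populations."""
    if c < 1.0:
        c = 1.0 / c              # uses P1*(c) = P1*(1/c)

    def integrand(z):
        return c * math.exp(-c * z) / (c * math.exp(-(c - 1.0) * z) + 1.0)

    upper = 40.0 / c             # integrand <= c*exp(-c*z): negligible beyond here
    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0

for c in (1, 2, 3, 4, 5, 10, 20):
    bound = math.log(c + 1) / (c - 1) if c > 1 else 0.5
    print(c, round(limiting_nn_error(c), 4), round(min(bound, 0.5), 4))
```

For c = 2 the integral has the closed form 1 - (ln 3)/2, which provides an independent check of the numerical result.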

The result that the "rule of nearest neighbor" will have, as n
tends to infinity, limiting probabilities of error of at most 1/2
is particularly interesting since, as will be shown, no such
general statement can be made for the linear discriminant function
when the populations are characterized by exponential
distributions.

Considering now the linear discriminant function for the case
when the populations, F and G, are assumed to have gamma
distributions, a computational formula will be developed for the
probabilities of misclassification. Again, it will be assumed that
the samples available from each population are of equal size.
Since this procedure consists of computing the arithmetic mean,
$(\bar{X}+\bar{Y})/2$, of the sample means and assigning Z to that
population whose sample mean lies on the same side of
$(\bar{X}+\bar{Y})/2$ as does Z itself, the error corresponding to
$P_1$ is committed if and only if

   $Z > (\bar{X}+\bar{Y})/2$ and $\bar{Y} > \bar{X}$

or

   $Z < (\bar{X}+\bar{Y})/2$ and $\bar{Y} < \bar{X}$.

Thus, by the definition of $P_1$ it follows that

   $P_1 = P[Z > (\bar{X}+\bar{Y})/2,\ \bar{Y} > \bar{X}] + P[Z < (\bar{X}+\bar{Y})/2,\ \bar{Y} < \bar{X}]$.

For convenience, it is desirable to define two new random
variables, S and T, where $S = n\bar{X}$ and $T = n\bar{Y}$. Let
the density functions of S and T be denoted by $f(s; nr, c\mu)$
and $g(t; nr, \mu)$, respectively. The probability $P_1$ can now
be expressed more conveniently as

   $P_1(n,c) = P[Z > (S+T)/2n,\ T > S] + P[Z < (S+T)/2n,\ T < S]$
   $\qquad = \int_0^\infty f(s; nr, c\mu)\,ds \int_s^\infty g(t; nr, \mu)\,dt \int_{(s+t)/2n}^\infty f(z; r, c\mu)\,dz$
   $\qquad + \int_0^\infty f(s; nr, c\mu)\,ds \int_0^s g(t; nr, \mu)\,dt \int_0^{(s+t)/2n} f(z; r, c\mu)\,dz$.

As in the "rule of nearest neighbor" procedure, it can easily be
shown by the change of variables $z' = cz$, $t' = ct$, and
$s' = cs$ that the relationship between $P_1$ and $P_2$ is again
given by $P_1(n,c) = P_2(n,1/c)$.

Since $P_1(n,c) = P_2(n,1/c)$, it is sufficient to obtain a
computation formula for $P_1(n,c)$. The methods employed to obtain
this formula are now outlined. First, it is observed that
$P_1(n,c)$ can be expressed as

   $P_1(n,c) = \int_0^\infty f(s; nr, c\mu)\,ds \int_0^s g(t; nr, \mu)\,dt$
   $\qquad + 2 \int_0^\infty f(s; nr, c\mu)\,ds \int_s^\infty g(t; nr, \mu)\,dt \int_{(s+t)/2n}^\infty f(z; r, c\mu)\,dz$
   $\qquad - \int_0^\infty f(s; nr, c\mu)\,ds \int_0^\infty g(t; nr, \mu)\,dt \int_{(s+t)/2n}^\infty f(z; r, c\mu)\,dz$.

Now, by utilizing the well known integration by parts formula,

   $\int x^{n-1} \exp(-ax)\,dx = -\exp(-ax) \sum_{k=0}^{n-1} \frac{\Gamma(n)\, x^{k}}{a^{n-k}\, \Gamma(k+1)}$,

it can be shown that

   $P_1(n,c) = 1 - c^{nr} \sum_{k=0}^{nr-1} \frac{(nr+k-1)!}{k!\,(nr-1)!\,(c+1)^{nr+k}}$
   $\qquad + \frac{2c^{nr+r}}{[(nr-1)!]^2} \sum_{k=0}^{r-1} \frac{1}{(2n)^k c^{r-k}} \sum_{i=0}^{k} \frac{(nr+i-1)!}{i!\,(k-i)!} \sum_{j=0}^{nr+i-1} \frac{(nr+k+j-i-1)!}{j!\,[1+c/(2n)]^{nr+i-j}\,[1+c+c/n]^{nr+k+j-i}}$
   $\qquad - \frac{c^{nr+r}}{[(nr-1)!]^2} \sum_{k=0}^{r-1} \frac{1}{(2n)^k c^{r-k}} \sum_{i=0}^{k} \frac{(nr+i-1)!\,(nr+k-i-1)!}{i!\,(k-i)!\,[1+c/(2n)]^{nr+i}\,[c+c/(2n)]^{nr+k-i}}$.

Table 4 provides a tabulation of the probabilities of
misclassification, $P_1(n,c) = P_2(n,1/c)$, for r equal to 1
through 20, c = 1, 2, 3, 4, 5, 10, 20 and the reciprocals, and a
fairly wide range of values of n.
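
A computation of this kind can be sketched as follows. The code evaluates the three sums of the formula for $P_1(n,c)$ in floating point, with $\mu$ set to 1. It is an illustrative sketch rather than the program used to prepare Table 4, the function name is hypothetical, and because it converts factorials to floating point it is suitable only for moderate values of nr.

```python
from math import factorial

def ldf_error_gamma(n, r, c):
    """P1(n, c) for gamma populations with integer shape r (mu set to 1)."""
    nr = n * r
    alpha = 1.0 + c / (2.0 * n)    # 1 + c/(2n)
    beta = 1.0 + c + c / n         # 1 + c + c/n
    gam = c + c / (2.0 * n)        # c + c/(2n)
    # First line of the formula: P(T < S).
    first = 1.0 - c**nr * sum(
        factorial(nr + k - 1)
        / (factorial(k) * factorial(nr - 1) * (c + 1.0) ** (nr + k))
        for k in range(nr))
    scale = c ** (nr + r) / factorial(nr - 1) ** 2
    second = third = 0.0
    for k in range(r):
        weight = 1.0 / ((2.0 * n) ** k * c ** (r - k))
        for i in range(k + 1):
            coef = factorial(nr + i - 1) / (factorial(i) * factorial(k - i))
            tail = sum(
                factorial(nr + k + j - i - 1)
                / (factorial(j) * alpha ** (nr + i - j) * beta ** (nr + k + j - i))
                for j in range(nr + i))
            second += weight * coef * tail
            third += (weight * coef * factorial(nr + k - i - 1)
                      / (alpha ** (nr + i) * gam ** (nr + k - i)))
    return first + 2.0 * scale * second - scale * third

print(round(ldf_error_gamma(2, 1, 2.0), 4))
print(round(ldf_error_gamma(2, 3, 1.0), 4))
```

When c = 1 the two populations coincide and the result should equal 1/2 for every n and r, which gives a simple check on the implementation.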

The probabilities of misclassification for the linear
discriminant function were also examined when unequal samples
were available from the populations F and G, for the special case
when r = 1. Using techniques analogous to those described in the
preceding paragraph, it is observed that for samples of size n and
m from the populations described by the distributions F and G,
respectively, the relationship between $P_1$ and $P_2$ is

   $P_1(j=n, i=m, c) = P_2(j=m, i=n, 1/c)$,

where

   $P_1(j=n, i=m, c) = 1 - \frac{1}{[1+1/(2j)]^{j}\,[1+c/(2i)]^{i}}$
   $\qquad - \frac{1}{[1+i/(jc)]^{j}} \sum_{k=0}^{i-1} \binom{j+k-1}{k} \frac{1}{[1+(jc)/i]^{k}}$
   $\qquad + \frac{2c^{j}}{[1+c/(2i)]^{i}\,[c/j+c+i/j]^{j}} \sum_{k=0}^{i-1} \binom{j+k-1}{k} \frac{(i+c/2)^{k}}{(jc+c+i)^{k}}$.

Although a tabulation of the error probabilities, $P_1$ and $P_2$,
when the sample sizes are not equal would be of some value and
interest, time limitations precluded the computation of a table
which would enumerate these probabilities.
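
Although no such table was computed, the unequal-sample formula is simple to evaluate. The sketch below follows the formula as written above, with $\mu$ set to 1. The function name is hypothetical, and the only check available within this section is agreement with the equal-sample entries of Table 4.

```python
from math import comb

def ldf_error_exponential_unequal(j, i, c):
    """P1 for r = 1 with j observations from F (rate c) and i from G (rate 1)."""
    q = (1.0 + 1.0 / (2 * j)) ** j * (1.0 + c / (2 * i)) ** i
    # Probability that the sample mean of G exceeds the sample mean of F.
    p_ybar_larger = sum(comb(j + k - 1, k) / (1.0 + j * c / i) ** k
                        for k in range(i)) / (1.0 + i / (j * c)) ** j
    # Last term of the formula.
    weighted = sum(comb(j + k - 1, k) * ((i + c / 2.0) / (j * c + c + i)) ** k
                   for k in range(i))
    weighted *= 2.0 * c**j / ((1.0 + c / (2 * i)) ** i * (c / j + c + i / j) ** j)
    return 1.0 - 1.0 / q - p_ybar_larger + weighted

print(round(ldf_error_exponential_unequal(2, 2, 2.0), 4))
print(round(ldf_error_exponential_unequal(2, 5, 2.0), 4))
```

Setting j = i recovers the equal-sample case, so the first printed value can be compared with the n = 2, c = 2 entry of Table 4.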

In the special case of r = 1, it was possible to determine the
limiting probabilities of misclassification. The procedure for
obtaining the limiting probabilities is briefly outlined. When
r = 1, the distributions F and G are exponential, and $P_1$ can be
expressed as

   $P_1(n,c) = 1/q(n,c) + \int_0^\infty f(s; n, c\mu)\,ds \int_0^s g(t; n, \mu)\,dt$
   $\qquad - 2 \int_0^\infty f(s; n, c\mu) \exp[-c\mu s/(2n)]\,ds \int_0^s g(t; n, \mu) \exp[-c\mu t/(2n)]\,dt$,

which, by the change of variables $s' = c\mu s$ and $t' = \mu t$
for the first integral, and $s' = c\mu(2n+1)s/(2n)$ and
$t' = \mu(2n+c)t/(2n)$ for the second integral, yields

   $P_1(n,c) = 1/q(n,c) + \int_0^\infty f(s; n, 1)\,ds \int_0^{s/c} g(t; n, 1)\,dt$
   $\qquad - [2/q(n,c)] \int_0^\infty f(s; n, 1)\,ds \int_0^{h(s)} g(t; n, 1)\,dt$,

where

   $h(s) = (2n+c)s/(2nc+c)$

and

   $q(n,c) = [1+1/(2n)]^{n}\,[1+c/(2n)]^{n}$.

Now, if the simple one-to-one transformation

   $x = s/(s+t)$

   $y = s + t$

is utilized, the above expression for $P_1(n,c)$ becomes

   $P_1(n,c) = 1/q(n,c) + \frac{1}{\Gamma^2(n)} \int_0^\infty y^{2n-1} \exp(-y)\,dy \int_{c/(c+1)}^{1} x^{n-1}(1-x)^{n-1}\,dx$
   $\qquad - \frac{2}{q(n,c)\,\Gamma^2(n)} \int_0^\infty y^{2n-1} \exp(-y)\,dy \int_{t(n,c)}^{1} x^{n-1}(1-x)^{n-1}\,dx$,

which, upon integrating out y, can be expressed as

(8).   $P_1(n,c) = 1/q(n,c) + \frac{1}{B(n,n)} \int_{c/(c+1)}^{1} x^{n-1}(1-x)^{n-1}\,dx - \frac{2}{q(n,c)\,B(n,n)} \int_{t(n,c)}^{1} x^{n-1}(1-x)^{n-1}\,dx$,

where

   $t(n,c) = (2nc+c)/(2n+2c+2nc)$

and

   $B(n,n) = \Gamma^2(n)/\Gamma(2n)$.

Since it is evident that $P_1(n,c) = 1/2$ for all n when c = 1, it
remains only to consider the cases 0 < c < 1 and c > 1. By
considering each case separately and applying Chebyshev's
inequality to formula (8), the limiting probability of $P_1(n,c)$
is

(9).   $P_1^*(c) = \lim_{n\to\infty} P_1(n,c) = \begin{cases} 1 - \exp[-(c+1)/2], & \text{if } 0 < c < 1 \\ 1/2, & \text{if } c = 1 \\ \exp[-(c+1)/2], & \text{if } c > 1 \end{cases}$

As mentioned previously, the limiting probabilities for the
nonparametric discriminator, the "rule of nearest neighbor," are
at most 1/2, but from formula (9) it is apparent that the limiting
probabilities for the linear discriminant function are greater
than 1/2 for $[2(\ln 2) - 1] < c < 1$.
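
Formulas (8) and (9) can be evaluated with a few lines of code. The sketch below is an illustration under the assumption that n is a positive integer, so that the incomplete beta integral reduces to a finite binomial sum; the function names are hypothetical.

```python
from math import comb, exp

def beta_upper_tail(a, n):
    """(1/B(n,n)) * integral from a to 1 of x^(n-1) (1-x)^(n-1) dx, integer n.

    Uses the binomial-sum form of the regularized incomplete beta function.
    """
    m = 2 * n - 1
    lower = sum(comb(m, j) * a**j * (1.0 - a) ** (m - j) for j in range(n, m + 1))
    return 1.0 - lower

def ldf_error_exponential(n, c):
    """Formula (8): P1(n, c) for exponential populations (r = 1)."""
    q = (1.0 + 1.0 / (2 * n)) ** n * (1.0 + c / (2 * n)) ** n
    t = (2 * n * c + c) / (2 * n + 2 * c + 2 * n * c)
    return (1.0 / q + beta_upper_tail(c / (c + 1.0), n)
            - 2.0 / q * beta_upper_tail(t, n))

def ldf_limit(c):
    """Formula (9): the limit of P1(n, c) as n tends to infinity."""
    if c == 1:
        return 0.5
    tail = exp(-(c + 1.0) / 2.0)
    return tail if c > 1 else 1.0 - tail

for c in (0.5, 2.0, 4.0):
    print(c, [round(ldf_error_exponential(n, c), 4) for n in (1, 2, 10, 100)],
          round(ldf_limit(c), 4))
```

For c = 0.5 the printed values stay above 1/2 for every n, which illustrates the remark that the linear discriminant function can have a limiting error probability greater than 1/2.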

TABLE  3

ERROR  PROBABILITIES
RULE  OF  NEAREST  NEIGHBOR
EXPONENTIAL  POPULATIONS

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

   1    .5000   .4000   .3262   .2741   .2359   .1385   .0757
   2    .5000   .4222   .3560   .3067   .2693   .1676   .0957
   3    .5000   .4317   .3691   .3215   .2850   .1831   .1077
   4    .5000   .4368   .3761   .3297   .2938   .1924   .1155
   5    .5000   .4399   .3804   .3347   .2992   .1985   .1209
   6    .5000   .4419   .3833   .3380   .3029   .2027   .1248
   7    .5000   .4434   .3853   .3404   .3055   .2057   .1278
   8    .5000   .4445   .3868   .3421   .3074   .2080   .1301
   9    .5000   .4453   .3879   .3435   .3089   .2098   .1319
  10    .5000   .4460   .3888   .3445   .3100   .2112   .1333
  15    .5000   .4478   .3913   .3475   .3134   .2152   .1377
  20    .5000   .4487   .3925   .3489   .3149   .2171   .1398
   ∞    .5000   .4507   .3954   .3524   .3188   .2217   .1447

TABLE  3  (continued)

ERROR  PROBABILITIES
RULE  OF  NEAREST  NEIGHBOR
EXPONENTIAL  POPULATIONS

 n\c    .5000   .3333   .2500   .2000   .1000   .0500

   1    .5333   .5214   .5037   .4870   .4329   .3907
   2    .5003   .4666   .4340   .4068   .3277   .2714
   3    .4856   .4426   .4043   .3733   .2858   .2239
   4    .4773   .4294   .3884   .3558   .2647   .1997
   5    .4719   .4212   .3788   .3455   .2527   .1854
   6    .4682   .4157   .3725   .3389   .2451   .1761
   7    .4655   .4118   .3683   .3345   .2400   .1698
   8    .4634   .4089   .3652   .3313   .2364   .1652
   9    .4618   .4067   .3629   .3290   .2338   .1617
  10    .4605   .4050   .3612   .3273   .2319   .1592
  20    .4547   .3984   .3549   .3212   .2247   .1492
   ∞    .4507   .3954   .3524   .3188   .2217   .1447

TABLE  4

LINEAR  DISCRIMINANT  FUNCTION
ERROR  PROBABILITIES
FOR  GAMMA  POPULATIONS

R  =  1

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

   1    .5000   .4000   .3262   .2741   .2359   .1385   .0757
   2    .5000   .3637   .2652   .2006   .1567   .0627   .0209
   3    .5000   .3391   .2292   .1619   .1188   .0366   .0083
   4    .5000   .3207   .2058   .1393   .0984   .0254   .0043
   5    .5000   .3062   .1898   .1252   .0864   .0197   .0026
   6    .5000   .2945   .1784   .1161   .0790   .0164   .0017
   7    .5000   .2848   .1702   .1099   .0741   .0142   .0012
   8    .5000   .2768   .1642   .1056   .0706   .0127   .0009
   9    .5000   .2701   .1597   .1024   .0681   .0115   .0007
  10    .5000   .2643   .1562   .1000   .0661   .0106   .0006
  15    .5000   .2460   .1473   .0936   .0606   .0082   .0003
  20    .5000   .2369   .1439   .0907   .0579   .0070   .0002
  25    .5000   .2322   .1421   .0890   .0563   .0064   .0001
  30    .5000   .2296   .1409   .0879   .0552   .0060   .0001
  35    .5000   .2280   .1401   .0870   .0544   .0057   .0001
  40    .5000   .2271   .1395   .0864   .0538   .0055   .0001
  50    .5000   .2260   .1387   .0856   .0530   .0052   .0001
  60    .5000   .2255   .1381   .0850   .0525   .0050   .0001
  70    .5000   .2251   .1377   .0846   .0521   .0049   .0001
  80    .5000   .2249   .1374   .0843   .0518   .0048   .0000
  90    .5000   .2247   .1372   .0840   .0516   .0047   .0000
 100    .5000   .2245   .1370   .0838   .0514   .0046   .0000
   ∞    .5000   .2231   .1353   .0821   .0498   .0041   .0000

TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 1

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .5333   .5214   .5037   .4870   .4329   .3907
  2   .5299   .5041   .4782   .4577   .4076   .3813
  3   .5278   .4947   .4666   .4469   .4053   .3866
  4   .5265   .4893   .4613   .4429   .4072   .3912
  5   .5256   .4860   .4588   .4419   .4096   .3944
  6   .5251   .4841   .4578   .4419   .4115   .3966
  7   .5247   .4829   .4576   .4425   .4131   .3982
  8   .5245   .4823   .4577   .4431   .4142   .3995
  9   .5244   .4819   .4580   .4428   .4152   .4004
 10   .5243   .4818   .4584   .4444   .4160   .4012
 15   .5244   .4823   .4601   .4465   .4183   .4036
 20   .5247   .4831   .4612   .4477   .4195   .4048
 25   .5250   .4838   .4619   .4484   .4202   .4055
 30   .5254   .4842   .4624   .4488   .4206   .4060
 35   .5256   .4846   .4627   .4492   .4210   .4063
 40   .5258   .4848   .4630   .4494   .4212   .4066
 50   .5262   .4852   .4623   .4498   .4216   .4070
 60   .5264   .4854   .4636   .4500   .4216   .4072
 70   .5266   .4856   .4637   .4502   .4220   .4074
 80   .5267   .4857   .4639   .4503   .4221   .4075
 90   .5268   .4858   .4640   .4504   .4222   .4076
100   .5269   .4859   .4640   .4505   .4222   .4077
  ∞   .5277   .4833   .4647   .4512   .4231   .4084

TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 2

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .3598   .2532   .1851   .1404   .0512   .0158
  2   .5000   .3127   .1839   .1133   .0733   .0138   .0017
  3   .5000   .2836   .1508   .0850   .0505   .0063   .0004
  4   .5000   .2639   .1329   .0717   .0406   .0033   .0001
  5   .5000   .2498   .1226   .0644   .0353   .0026   .0001
  6   .5000   .2396   .1162   .0600   .0320   .0020   .0000
  7   .5000   .2319   .1120   .0571   .0298   .0016   .0000
  8   .5000   .2261   .1090   .0549   .0281   .0013   .0000
  9   .5000   .2217   .1069   .0533   .0269   .0011   .0000
 10   .5000   .2182   .1052   .0520   .0259   .0010   .0000
 15   .5000   .2092   .1006   .0481   .0229   .0006   .0000
 20   .5000   .2058   .0984   .0462   .0215   .0005   .0000
 25   .5000   .2042   .0970   .0450   .0207   .0004   .0000
 30   .5000   .2033   .0961   .0443   .0201   .0004   .0000
 35   .5000   .2027   .0955   .0437   .0197   .0003   .0000
 40   .5000   .2022   .0950   .0433   .0194   .0003   .0000
 50   .5000   .2016   .0943   .0427   .0190   .0003   .0000
 60   .5000   .2012   .0939   .0423   .0187   .0003   .0000
 70   .5000   .2009   .0935   .0421   .0185   .0003   .0000
 80   .5000   .2007   .0933   .0419   .0184   .0003   .0000
 90   .5000   .2005   .0931   .0417   .0183   .0003   .0000
100   .5000   .2004   .0929   .0416   .0162   .0002   .0000



TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 2

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .4999   .4487   .4071   .3770   .3109   .2807
  2   .4782   .4102   .3678   .3423   .2976   .2794
  3   .4658   .3946   .3566   .3354   .2979   .2806
  4   .4580   .3877   .3533   .3343   .2986   .2812
  5   .4527   .3846   .3526   .3344   .2991   .2815
  6   .4491   .3832   .3526   .3348   .2994   .2817
  7   .4466   .3827   .3528   .3351   .2997   .2819
  8   .4448   .3826   .3530   .3354   .2999   .2820
  9   .4435   .3826   .3533   .3356   .3000   .2821
 10   .4426   .3827   .3535   .3358   .3001   .2821
 15   .4409   .3833   .3541   .3363   .3004   .2823
 20   .4407   .3837   .3544   .3366   .3005   .2824
 25   .4409   .3840   .3546   .3367   .3006   .2824
 30   .4410   .3841   .3547   .3368   .3007   .2825
 35   .4412   .3842   .3548   .3369   .3007   .2825
 40   .4413   .3843   .3549   .3370   .3008   .2825
 50   .4415   .3845   .3550   .3371   .3008   .2825
 60   .4416   .3845   .3550   .3371   .3008   .2826
 70   .4417   .3846   .3551   .3371   .3008   .2826
 80   .4417   .3846   .3551   .3372   .3009   .2826
 90   .4418   .3847   .3552   .3372   .3009   .2826
100   .4418   .3847   .3552   .3372   .3009   .2626



TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 3

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .3266   .1998   .1278   .0859   .0197   .0035
  2   .5000   .2719   .1317   .0675   .0368   .0034   .0002
  3   .5000   .2412   .1046   .0485   .0237   .0012   .0000
  4   .5000   .2223   .0918   .0403   .0184   .0006   .0000
  5   .5000   .2100   .0849   .0359   .0156   .0004   .0000
  6   .5000   .2018   .0807   .0331   .0138   .0003   .0000
  7   .5000   .1961   .0779   .0312   .0126   .0002   .0000
  8   .5000   .1920   .0758   .0298   .0117   .0001   .0000
  9   .5000   .1891   .0743   .0287   .0110   .0001   .0000
 10   .5000   .1869   .0730   .0278   .0105   .0001   .0000
 15   .5000   .1815   .0694   .0252   .0090   .0001   .0000
 20   .5000   .1794   .0675   .0240   .0083   .0000   .0000
 25   .5000   .1782   .0664   .0232   .0078   .0000   .0000
 30   .5000   .1774   .0657   .0227   .0076   .0000   .0000
 35   .5000   .1769   .0651   .0224   .0074   .0000   .0000
 40   .5000   .1765   .0648   .0221   .0072   .0000   .0000
 50   .5000   .1759   .0642   .0217   .0070   .0000   .0000
 60   .5000   .1755   .0638   .0215   .0069   .0000   .0000
 70   .5000   .1752   .0636   .0213   .0068   .0000   .0000
 80   .5000   .1750   .0634   .0212   .0067   .0000   .0000
 90   .5000   .1749   .0632   .0211   .0067   .0000   .0000
100   .5000   .1747   .0631   .0210   .0066   .0000   .0000



TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 3

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .4660   .3876   .3361   .3041   .2471   .2260
  2   .4326   .3443   .3004   .2770   .2374   .2199
  3   .4154   .3307   .2931   .2728   .2353   .2173
  4   .4056   .3260   .2913   .2718   .2341   .2158
  5   .3997   .3242   .2908   .2713   .2333   .2148
  6   .3960   .3236   .2905   .2710   .2328   .2141
  7   .3937   .3234   .2904   .2708   .2324   .2136
  8   .3923   .3233   .2903   .2707   .2321   .2132
  9   .3913   .3233   .2902   .2705   .2318   .2129
 10   .3908   .3233   .2902   .2704   .2316   .2126
 15   .3899   .3233   .2900   .2701   .2310   .2119
 20   .3900   .3233   .2899   .2699   .2307   .2115
 25   .3901   .3233   .2898   .2698   .2305   .2112
 30   .3902   .3233   .2898   .2698   .2303   .2111
 35   .3903   .3233   .2897   .2697   .2302   .2109
 40   .3903   .3233   .2897   .2697   .2302   .2108
 50   .3904   .3233   .2897   .2696   .2301   .2107
 60   .3904   .3233   .2897   .2696   .2300   .2106
 70   .3905   .3233   .2896   .2695   .2299   .2106
 80   .3905   .3233   .2896   .2695   .2299   .2105
 90   .3905   .3233   .2896   .2695   .2299   .2105
100   .3905   .3233   .2896   .2695   .2298   .2105



TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 4

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .2974   .1588   .0892   .0533   .0078   .0008
  2   .5000   .2379   .0964   .0417   .0193   .0009   .0000
  3   .5000   .2076   .0750   .0289   .0117   .0003   .0000
  4   .5000   .1904   .0656   .0235   .0087   .0001   .0000
  5   .5000   .1801   .0606   .0206   .0071   .0001   .0000
  6   .5000   .1736   .0574   .0187   .0061   .0000   .0000
  7   .5000   .1693   .0552   .0174   .0055   .0000   .0000
  8   .5000   .1663   .0536   .0165   .0050   .0000   .0000
  9   .5000   .1642   .0524   .0158   .0046   .0000   .0000
 10   .5000   .1626   .0514   .0152   .0044   .0000   .0000
 15   .5000   .1586   .0484   .0135   .0036   .0000   .0000
 20   .5000   .1567   .0469   .0127   .0032   .0000   .0000
 25   .5000   .1556   .0460   .0122   .0030   .0000   .0000
 30   .5000   .1549   .0454   .0119   .0029   .0000   .0000
 35   .5000   .1544   .0449   .0117   .0028   .0000   .0000
 40   .5000   .1540   .0446   .0115   .0027   .0000   .0000
 50   .5000   .1534   .0442   .0113   .0027   .0000   .0000
 60   .5000   .1531   .0439   .0111   .0026   .0000   .0000



TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 4

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .4345   .3383   .2844   .2543   .2062   .1885
  2   .3937   .2967   .2547   .2330   .1951   .1777
  3   .3748   .2861   .2493   .2290   .1910   .1730
  4   .3650   .2828   .2475   .2273   .1887   .1704
  5   .3597   .2815   .2465   .2262   .1872   .1688
  6   .3568   .2809   .2459   .2254   .1862   .1676
  7   .3551   .2806   .2454   .2249   .1855   .1668
  8   .3541   .2803   .2451   .2245   .1849   .1661
  9   .3536   .2802   .2448   .2241   .1845   .1656
 10   .3532   .2800   .2446   .2239   .1841   .1652
 15   .3528   .2795   .2439   .2230   .1830   .1640
 20   .3528   .2793   .2435   .2226   .1824   .1633
 25   .3528   .2792   .2433   .2223   .1821   .1630
 30   .3528   .2791   .2432   .2222   .1818   .1627
 35   .3528   .2790   .2431   .2220   .1817   .1625
 40   .3528   .2789   .2430   .2219   .1815   .1624
 50   .3528   .2789   .2429   .2218   .1814   .1622
 60   .3528   .2788   .2428   .2217   .1812   .1620



TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 5

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .2714   .1269   .0629   .0335   .0031   .0002
  2   .5000   .2093   .0718   .0264   .0105   .0002   .0000
  5   .5000   .1565   .0439   .0120   .0033   .0000   .0000
 10   .5000   .1426   .0365   .0084   .0018   .0000   .0000

R = 6

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .2480   .1019   .0446   .0213   .0013   .0000
  2   .5000   .1850   .0542   .0170   .0058   .0001   .0000
  5   .5000   .1374   .0321   .0071   .0015   .0000   .0000
 10   .5000   .1256   .0261   .0047   .0008   .0000   .0000



TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 5

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .4057   .2984   .2455   .2181   .1759   .1596
  2   .3605   .2607   .2208   .2000   .1627   .1458
  5   .3283   .2482   .2121   .1914   .1527   .1349
 10   .3236   .2459   .2093   .1882   .1488   .1307
 50   .3227   .2440

R = 6

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .3794   .2658   .2153   .1904   .1519   .1363
  2   .3320   .2321   .1938   .1735   .1371   .1208
  5   .3026   .2209   .1843   .1637   .1259   .1090
 10   .2988   .2180   .1808   .1599   .1217   .1045



TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 7

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .2268   .0821   .0319   .0136   .0005   .0000
  2   .5000   .1642   .0414   .0111   .0032   .0000   .0000
  5   .5000   .1214   .0237   .0042   .0007   .0000   .0000
 10   .5000   .1111   .0188   .0026   .0003   .0000   .0000

R = 8

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .2077   .0664   .0230   .0088   .0002   .0000
  2   .5000   .1463   .0319   .0074   .0018   .0000   .0000
  5   .5000   .1078   .0175   .0025   .0003   .0000   .0000
 10   .5000   .0985   .0136   .0015   .0001   .0000   .0000



TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 7

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .3555   .2387   .1911   .1682   .1322   .1172
  2   .3073   .2087   .1715   .1517   .1163   .1008
  5   .2809   .1979   .1612   .1409   .1046   .0888
 10   .2775   .1945   .1574   .1369   .1003   .0843

R = 8

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .3336   .2159   .1711   .1497   .1155   .1012
  2   .2857   .1890   .1526   .1333   .0992   .0846
  5   .2621   .1781   .1417   .1220   .0874   .0727
 10   .2588   .1744   .1377   .1178   .0631   .0684

TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 9

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .1904   .0540   .0166   .0057   .0001   .0000
  2   .5000   .1308   .0247   .0049   .0010   .0000   .0000
  5   .5000   .0961   .0130   .0015   .0002   .0000   .0000
 10   .5000   .0875   .0099   .0009   .0001   .0000   .0000

R = 10

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .1747   .0440   .0121   .0037   .0000   .0000
  2   .5000   .1174   .0193   .0033   .0006   .0000   .0000
  5   .5000   .0859   .0097   .0009   .0001   .0000   .0000



TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 9

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .3137   .1965   .1542   .1341   .1014   .0877
  2   .2667   .1720   .1364   .1175   .0850   .0713
  5   .2454   .1609   .1251   .1060   .0734   .0599
 10   .2421

R = 10

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .2954   .1798   .1397   .1206   .0893   .0762
  2   .2498   .1571   .1223   .1040   .0730   .0603
  5   .2305   .1458   .1108   .0925   .0619   .0495
 10   .2270   .1417   .1066   .0883   .0579   .0457



TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 11

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .1605   .0359   .0088   .0025   .0000   .0000
  2   .5000   .1056   .0151   .0022   .0003   .0000   .0000
  5   .5000   .0770   .0073   .0005   .0000   .0000   .0000
 10   .5000   .0694   .0052   .0003   .0000   .0000   .0000

R = 12

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .1475   .0294   .0065   .0016   .0000   .0000
  2   .5000   .0953   .0119   .0015   .0002   .0000   .0000
  5   .5000   .0691   .0054   .0003   .0000   .0000   .0000



TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 11

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .2787   .1652   .1270   .1087   .0788   .0664
  2   .2348   .1439   .1099   .0923   .0630   .0511
  5   .2170   .1324   .0984   .0809   .0523   .0410

R = 12

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .2633   .1523   .1159   .0983   .0696   .0580
  2   .2212   .1321   .0990   .0821   .0544   .0435
  5   .2046   .1205   .0876   .0710   .1443   .0341
 10   .2009   .1163   .0835   .0670   .0408   .0309



TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 13

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .1358   .0242   .0048   .0011   .0000   .0000
  2   .5000   .0861   .0094   .0010   .0001   .0000   .0000
  5   .5000   .0621   .0041   .0002   .0000   .0000   .0000
 10   .5000   .0554   .0028   .0001   .0000   .0000   .0000

R = 14

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .1250   .0199   .0035   .0007   .0000   .0000
  2   .5000   .0781   .0074   .0007   .0001   .0000   .0000
 10   .5000   .0496   .0021   .0001   .0000   .0000   .0000

TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 13

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .2492   .1409   .1060   .0891   .0617   .0508
  2   .2089   .1215   .0894   .0731   .0471   .0371

R = 14

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .2361   .1307   .0971   .0808   .0547   .0445
  2   .1977   .1119   .0808   .0653   .0408   .0316
  5   .1829   .1004   .0698   .0549   .0320   .0237
 10   .1789   .0962

TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 15

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .1152   .0165   .0026   .0005   .0000   .0000
  2   .5000   .0709   .0059   .0005   .0000   .0000   .0000
  5   .5000   .0504   .0023   .0001   .0000   .0000   .0000

R = 16

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .1063   .0136   .0020   .0003   .0000   .0000
  2   .5000   .0645   .0047   .0003   .0000   .0000   .0000

TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 15

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .2241   .1214   .0891   .0734   .0486   .0391
  2   .1875   .1033   .0731   .0583   .0355   .0271

R = 16

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .2129   .1131   .0819   .0668   .0433   .0343
  2   .1780   .0954   .0663   .0522   .0309   .0232

TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 17

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .0981   .0113   .0015   .0002   .0000   .0000
  2   .5000   .0587   .0037   .0002   .0000   .0000   .0000

R = 18

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .0906   .0094   .0011   .0001   .0000   .0000
  2   .5000   .0536   .0029   .0001   .0000   .0000   .0000

TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 17

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .2026   .1054   .0753   .0609   .0385   .0302
  2   .1693   .0882   .0601   .0468   .0269   .0199

R = 18

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .1929   .0985   .0693   .0555   .0344   .0266
  2   .1612   .0816   .0546   .0419   .0234   .0171

TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 19

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .0838   .0078   .0008   .0001   .0000   .0000
  2   .5000   .0489   .0023   .0001   .0000   .0000   .0000

R = 20

n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0

  1   .5000   .0775   .0065   .0006   .0001   .0000   .0000
  2   .5000   .0448   .0019   .0001   .0000   .0000   .0000

TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 19

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .1840   .0920   .0639   .0506   .0307   .0235
  2   .1536   .0756   .0496   .0376   .0205   .0147

R = 20

n\c   .5000   .3333   .2500   .2000   .1000   .0500

  1   .1756   .0861   .0590   .0462   .0274   .0207
  2   .1465   .0701   .0452   .0338   .0179   .0126

SECTION IV
SUMMARY AND CONCLUSION

Section II of this paper briefly summarizes some of the work
accomplished by Hodges and Fix in [3]. Their investigation was
concerned with the computation of the probabilities of
misclassification for various nonparametric procedures, assuming
some parametric form for the distributions which describe the
populations. The error probabilities for the "optimum" parametric
procedure were also computed and compared with the nonparametric
error probabilities. The investigation considered the
two-population classification problem when the populations have
normal distributions with equal covariance matrices. The
parametric procedure employed was the linear discriminant
function, which is the appropriate method in this situation, and
the primary nonparametric procedure considered was the "rule of
nearest neighbor." These two procedures were compared by computing
the probabilities of misclassification. The results of this
investigation indicated that the "rule of nearest neighbor" gave
"reasonable" error probabilities.

Section III also considers the two-population classification
problem, but the investigation is primarily concerned with the
performance of the linear discriminant function when the actual
densities which describe the populations are not normal but gamma,
with density functions defined by formulas (5) and (6) of
Section III. Also included in Section III is a limited
investigation of the "rule of nearest neighbor" when the
populations are assumed to be exponential. The performance of both
the linear discriminant function and the "rule of nearest
neighbor" was evaluated by computing the probabilities of
misclassification.
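
The nearest-neighbor computation summarized above can be checked by simulation. The following is a minimal Monte Carlo sketch, not the computation used in this paper. It assumes that population 1 is exponential with mean 1, that population 2 is exponential with mean c, that n observations are drawn from each, and that the tabulated quantity is the probability of misclassifying a new observation from population 1. The function name `nn_error_rate` and its parameters `trials` and `seed` are hypothetical.

```python
import random


def nn_error_rate(n, c, trials=100_000, seed=0):
    """Estimate the probability that the rule of nearest neighbor
    misclassifies an observation drawn from population 1.

    Assumed setup (not taken verbatim from the paper):
    population 1 is exponential with mean 1, population 2 is
    exponential with mean c, and n sample points come from each.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        # New observation to classify, drawn from population 1.
        z = rng.expovariate(1.0)
        # Distance from z to its nearest sample point in each population.
        # expovariate takes the rate, so mean c corresponds to rate 1/c.
        d1 = min(abs(rng.expovariate(1.0) - z) for _ in range(n))
        d2 = min(abs(rng.expovariate(1.0 / c) - z) for _ in range(n))
        # An error occurs when the nearest neighbor is from population 2.
        if d2 < d1:
            errors += 1
    return errors / trials


if __name__ == "__main__":
    for c in (0.5, 2.0):
        print(c, nn_error_rate(1, c))
```

Under these assumptions the estimates for n = 1 should fall close to the Table 3 entries .5333 (c = .5000) and .4000 (c = 2.0), up to simulation noise.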

When the population densities are assumed to be exponential,
Table 3 and Table 4, for the case r = 1, provide a means of
comparing the performance of the linear discriminant function and
the "rule of nearest neighbor." An examination of these tables
indicates that both procedures can result in "high" probabilities
of error, particularly when c assumes values near one. For small
sample sizes, both procedures can result in error probabilities
which are greater than 1/2. Even as n, the sample size from each
population, tends to infinity, the linear discriminant function
has error probabilities greater than 1/2 for
[2(ln 2) - 1] < c < 1; it is of interest to note that the "rule
of nearest neighbor" in this limiting situation will always have
error probabilities less than or equal to 1/2. Also, depending
upon the importance of each type of error, it is possible for the
linear discriminant function to be a "fairly useful" procedure,
since one error probability is usually "small." Table 4 also shows
that as r increases, the probabilities of misclassification
decrease. This result was anticipated since, for increasing r, the
gamma distribution approaches the normal distribution by the
Central Limit Theorem.
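
The bound [2(ln 2) - 1] quoted above can be illustrated with a short calculation. The sketch below rests on assumptions that are not stated in this section: population 1 is gamma with integer shape r and unit scale, population 2 is gamma with shape r and scale c, and the tabulated error is for an observation from population 1. In one dimension the linear discriminant function compares the observation with the midpoint of the two sample means, so as n tends to infinity the cut point tends to r(1 + c)/2. The function names are hypothetical.

```python
import math


def gamma_sf(r, x):
    """P(G > x) for G gamma with integer shape r and unit scale
    (the Erlang closed form)."""
    return math.exp(-x) * sum(x**k / math.factorial(k) for k in range(r))


def ldf_limit_error(r, c):
    """Limiting (n -> infinity) probability of misclassifying an
    observation from population 1, under the assumptions above."""
    if c == 1:
        return 0.5  # identical populations: the rule is a coin flip
    cut = r * (1.0 + c) / 2.0  # midpoint of the population means r and r*c
    tail = gamma_sf(r, cut)
    # For c > 1 population 1 has the smaller mean, so an error is z > cut;
    # for c < 1 population 1 has the larger mean, so an error is z < cut.
    return tail if c > 1 else 1.0 - tail


if __name__ == "__main__":
    threshold = 2 * math.log(2) - 1  # about 0.386
    print(threshold, ldf_limit_error(1, 0.5), ldf_limit_error(1, 2.0))
```

For r = 1 and c < 1 the limiting error is 1 - exp(-(1 + c)/2), which exceeds 1/2 exactly when c > 2(ln 2) - 1, in agreement with the interval stated in the text. Under the same assumptions, r = 1 and c = 2.0 gives exp(-1.5), about .2231, which matches the limiting row of Table 4.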

The following recommendations are made on the basis of this
paper.

(i)   Investigate the performance of the nonparametric
      procedure using k = 3 instead of the "rule of nearest
      neighbor," k = 1.

(ii)  Investigate the performance of the nonparametric
      procedures proposed by Hodges and Fix in [2], employing
      different distance functions.

(iii) Develop a more satisfactory computational formula for the
      linear discriminant function when the populations are
      assumed to be gamma, in the situation when r and n are
      large, since the formula used in this paper required many
      hours of computer time.

(iv)  Investigate the performance of the linear discriminant
      function and other nonparametric procedures for other
      distributions. A cursory investigation was made for the
      beta distribution, and the analysis appears to be more
      difficult.

(v)   Compare the performance of Bayesian parametric and
      nonparametric classification procedures.

(vi)  Investigate the classification problem when there are
      more than two populations.